15. Quality: Programmatic Assessment 1
In the above video, I state that leaving the country column as the object (string) data type is fine, while I argue that state should be the category data type. This topic deserves a little more discussion.
state is categorical because its values come from a finite set of options with no inherent order. country, for all intents and purposes, also has a finite set of values, so it could be argued to be categorical as well. Its values don't seem to vary freely enough to justify classifying it as a string.
So why use object as the data type for country? First, country can still take a lot of distinct values, and a categorical variable with a very large number of categories isn't that useful. Second, the choice is situational, i.e., it depends on the context in which you'd like to use the country column. In this dataset, all of the clinical trial patients are from the United States, so nothing is gained by switching the data type from object to category. The country column won't be used for analysis.
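As a minimal sketch of that choice in pandas: the rows below are made up for illustration (only the column names state and country come from the lesson), and the variable names are assumptions. It converts state to category and leaves country as it was loaded.

```python
import pandas as pd

# Hypothetical patient records; the values are invented for illustration.
patients = pd.DataFrame({
    "state": ["CA", "NY", "CA", "TX", "NY", "CA"],
    "country": ["United States"] * 6,
})

# state: a small, unordered set of repeated values, so category fits.
patients["state"] = patients["state"].astype("category")

# country: a single value in this dataset and not used for analysis,
# so it is left as the string type it was loaded with.
n_countries = patients["country"].nunique()

state_dtype = patients["state"].dtype
state_categories = list(patients["state"].cat.categories)
```

Checking nunique() before deciding is a cheap way to see whether a column carries any information worth encoding as categories.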
A more general scenario outside of this dataset is as follows. If you had only one to a few observations from each country, it would probably be best to treat country as a string and group observations by a larger unit, like world_region (Africa, Asia, Central America, etc.). If you had a lot of observations from a few countries, like test scores from students sampled in a handful of countries, making country categorical would be more appropriate.
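The first case, few observations per country, might look like the sketch below. The data, the score column, and the region_of lookup are all hypothetical; in practice the country-to-region mapping would come from a reference table you trust.

```python
import pandas as pd

# Hypothetical data: one observation per country.
scores = pd.DataFrame({
    "country": ["Kenya", "Japan", "Panama", "Ghana", "India"],
    "score": [70, 85, 60, 80, 75],
})

# Hypothetical lookup from country to a larger unit.
region_of = {
    "Kenya": "Africa",
    "Ghana": "Africa",
    "Japan": "Asia",
    "India": "Asia",
    "Panama": "Central America",
}

# country stays a plain string; world_region is the categorical grouping unit.
scores["world_region"] = scores["country"].map(region_of).astype("category")

# observed=True restricts the result to regions that actually appear in the data.
mean_by_region = scores.groupby("world_region", observed=True)["score"].mean()
```

In the second case, many observations from a handful of countries, the same astype("category") call could be applied to country directly instead of to a derived region column.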
The answer to a lot of questions in data analysis and data science is "it depends." This is what makes wrangling tricky sometimes since you have to understand the context of your data to make the best decision. Data scientists in a workplace should often consult with others on the team who know the data context best, or who will use the results of analysis later, like business analysts or product owners.